🎉 Multiple Papers from Our Team Accepted at CVPR 2024
The CVPR 2024 list of accepted papers has been officially released, and we are thrilled to announce that multiple papers from our team are among them!
Paper 1: GP-NeRF
📄 GP-NeRF: Generalized Perception NeRF for Context-Aware 3D Scene Understanding
Authors: Hao Li, Dingwen Zhang, Yalun Dai, et al.
Conference: CVPR 2024
Research Background
Applying NeRF to downstream perception tasks for scene understanding and representation is becoming increasingly popular. Most existing methods treat semantic prediction as an additional rendering task, i.e., the "label rendering" task, to build semantic NeRFs.
However, because they render semantic/instance labels pixel by pixel without considering the contextual information of the rendered image, these methods usually suffer from blurry segmentation boundaries and inconsistent labels for pixels within a single object.
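For context, here is a minimal sketch of the per-pixel "label rendering" idea these semantic-NeRF baselines rely on: semantic logits are composited along each ray with the same volume-rendering weights as color, independently for every pixel. The tensor shapes and PyTorch implementation below are our own assumptions, not code from any specific paper.

```python
import torch

def render_ray(sigmas, rgbs, sem_logits, deltas):
    # sigmas: (N,) densities at N samples along one ray
    # rgbs: (N, 3) per-sample colors; sem_logits: (N, C) per-sample semantic logits
    # deltas: (N,) distances between consecutive samples
    alphas = 1.0 - torch.exp(-sigmas * deltas)                     # per-sample opacity
    trans = torch.cumprod(
        torch.cat([torch.ones(1), 1.0 - alphas + 1e-10])[:-1], dim=0
    )                                                              # accumulated transmittance
    weights = alphas * trans                                       # volume-rendering weights
    color = (weights[:, None] * rgbs).sum(dim=0)                   # rendered pixel color
    # "label rendering": semantic logits are composited per pixel,
    # with no awareness of neighboring pixels or image context
    semantics = (weights[:, None] * sem_logits).sum(dim=0)
    return color, semantics
```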
Key Contributions
To solve this problem, we propose Generalized Perception NeRF (GP-NeRF), a novel pipeline that makes widely used segmentation models and NeRF work compatibly under a unified framework to facilitate context-aware 3D scene perception.
Main innovations include:
- We introduce transformers to jointly aggregate the radiance field and the semantic embedding field for novel views, and to facilitate the joint volumetric rendering of both fields.
- We propose two self-distillation mechanisms, the Semantic Distill Loss and the Depth-Guided Semantic Distill Loss, to enhance the discrimination and quality of the semantic field and to maintain geometric consistency (a hedged sketch follows this list).
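As a hedged sketch of these two ideas, the snippet below fuses per-view features of each ray sample with a transformer, decodes both a radiance field and a semantic embedding field from the fused representation, and renders them with shared weights; it also shows one plausible form of a semantic self-distillation term. All layer choices, dimensions, and the loss formulation are assumptions (the depth-guided variant is omitted), and the paper's actual design may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointFieldAggregator(nn.Module):
    """Sketch: a transformer fuses per-source-view features of each 3D sample,
    then both a radiance field and a semantic embedding field are decoded and
    composited with the same volume-rendering weights."""

    def __init__(self, feat_dim=64, sem_dim=32, n_heads=4, n_layers=2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(feat_dim, n_heads, batch_first=True)
        self.fuse = nn.TransformerEncoder(layer, n_layers)
        self.to_density_color = nn.Linear(feat_dim, 4)   # sigma + RGB
        self.to_semantic = nn.Linear(feat_dim, sem_dim)  # semantic embedding

    def forward(self, view_feats, deltas):
        # view_feats: (n_samples, n_views, feat_dim) image features of each
        # ray sample projected into the source views; deltas: (n_samples,)
        fused = self.fuse(view_feats).mean(dim=1)        # (n_samples, feat_dim)
        raw = self.to_density_color(fused)
        sigma, rgb = F.softplus(raw[:, 0]), torch.sigmoid(raw[:, 1:])
        sem = self.to_semantic(fused)

        # Joint volumetric rendering: one set of weights composites both fields.
        alphas = 1.0 - torch.exp(-sigma * deltas)
        trans = torch.cumprod(
            torch.cat([torch.ones(1), 1.0 - alphas + 1e-10])[:-1], dim=0)
        weights = alphas * trans
        return (weights[:, None] * rgb).sum(0), (weights[:, None] * sem).sum(0)


def semantic_distill_loss(rendered_logits, teacher_logits, tau=1.0):
    """Hypothetical self-distillation term: align the rendered semantic map
    with a 2D segmentation model's prediction on the same view."""
    log_p = F.log_softmax(rendered_logits / tau, dim=-1)
    q = F.softmax(teacher_logits / tau, dim=-1)
    return F.kl_div(log_p, q, reduction="batchmean") * tau ** 2
```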
Experimental Results
We conduct experimental comparisons under two perception tasks (semantic and instance segmentation) using both synthetic and real-world datasets. Notably, our method outperforms SOTA approaches by:
- 6.94% on generalized semantic segmentation
- 11.76% on finetuning semantic segmentation
- 8.47% on instance segmentation
Paper 2: LTGC
📄 LTGC: Long-Tail Recognition via Leveraging Generated Content
Authors: Qihao Zhao, Yalun Dai, Hao Li, Wei Hu, Fan Zhang, Jun Liu
Conference: CVPR 2024
Research Background
Long-tail recognition is challenging because it requires the model to learn good representations for tail categories while addressing the imbalance across all categories.
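As a toy illustration of the imbalance (the numbers below are made up, not from the paper), an exponentially decaying class distribution means the tail half of the classes contributes only a small fraction of the training samples, so a plain cross-entropy objective is dominated by head classes:

```python
import numpy as np

# Synthetic long-tailed class sizes: head classes have ~1000 samples,
# the rarest tail classes only a handful.
n_classes = 100
counts = (1000 * 0.95 ** np.arange(n_classes)).astype(int)   # head -> tail
tail = counts[n_classes // 2:]                                # the "tail" half

print(f"largest head class: {counts[0]}, smallest tail class: {counts[-1]}")
print(f"share of training samples from the tail half: {tail.sum() / counts.sum():.1%}")
```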
Key Contributions
In this paper, we propose a novel generative and fine-tuning framework, LTGC, to handle long-tail recognition via leveraging generated content.
Main innovations include:
- Inspired by the rich implicit knowledge in large-scale models (e.g., large language models, LLMs), LTGC leverages the power of these models to parse and reason over the original tail data to produce diverse tail-class content.
- We propose several novel designs in LTGC to ensure the quality of the generated data and to efficiently fine-tune the model on both the generated and the original data (a hedged sketch of such a pipeline follows this list).
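Below is a hedged, high-level sketch of what an LTGC-style expand-and-fine-tune pipeline could look like. The callables `describe`, `generate`, and `keep` are placeholders for an LLM prompt step, a generative image model, and a quality filter respectively; they are not the paper's actual API, and the balancing logic is our own simplification.

```python
from typing import Callable, Dict, List

def expand_tail_class(class_name: str,
                      describe: Callable[[str, int], List[str]],
                      generate: Callable[[str], object],
                      keep: Callable[[object, str], bool],
                      n_new: int) -> List[object]:
    # 1) An LLM-backed `describe` call parses and reasons over the tail class,
    #    returning diverse textual descriptions (poses, backgrounds, lighting, ...).
    prompts = describe(class_name, n_new)
    # 2) A generative model turns each description into a candidate image.
    candidates = [generate(p) for p in prompts]
    # 3) A quality filter keeps only candidates that still match the class.
    return [img for img in candidates if keep(img, class_name)]

def balance_with_generated(data: Dict[str, List[object]], target_per_class: int,
                           describe, generate, keep) -> Dict[str, List[object]]:
    # Top up only the classes that fall short of the target count; the model is
    # then fine-tuned on the combined original + generated data.
    balanced = {}
    for cls, samples in data.items():
        missing = max(0, target_per_class - len(samples))
        extra = expand_tail_class(cls, describe, generate, keep, missing) if missing else []
        balanced[cls] = samples + extra
    return balanced
```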
Experimental Results
Visualizations demonstrate the effectiveness of the generation module in LTGC, which produces accurate and diverse tail-class data. Moreover, experimental results show that LTGC outperforms existing state-of-the-art methods on popular long-tailed benchmarks.
Conclusion
These acceptances at CVPR 2024 represent significant milestones for our research team. Both papers tackle important challenges in computer vision: one focuses on 3D scene understanding with NeRF, and the other addresses long-tail recognition with generative models.
Congratulations to Hao Li, Qihao Zhao, and all co-authors for these outstanding achievements! 🎊